A Music Structure Inference Algorithm Based on Morphological Analysis
Abstract
Music structure refers to the description of the long-term organization of a music piece through a sequence of structural segments. A structural segment can be defined by its structural borders (a start time and an end time) and a label reflecting the similarity of its musical content to that of the other segments. Its duration is typically 16 s or more.

This document presents the music structure estimation system submitted to the MIREX 2012 structural segmentation task. It consists of three steps: feature extraction, structural border estimation and segment labeling. First, the system produces a sequence of chroma vectors [6] expressed at the snap scale [1] (section 1). This sequence is used to compute a segmentation criterion based on a morphological model of the structural segments [2] (section 2.1). The structural borders are estimated by searching for the segmentation with the lowest cost, which combines this criterion with a regularity constraint (section 2.2). The segments are then labeled by clustering them according to their similarity, through the minimization of an adaptive model selection criterion (section 3).

1. FEATURE EXTRACTION

The sequence of 12-dimensional chroma vectors used to describe the musical content of the piece is extracted with the “Chroma Toolbox” by Müller and Ewert [6]. We use the CP features with a regular hop of 0.1 s. These features are then expressed at the snap scale. Here, the snap is defined as the multiple of the beat period that is closest to 1 s. The snap scale is synchronous with the downbeat scale, and the two are often equal in practice. Beats and downbeats are estimated with the MATLAB implementation by Davies et al. [4, 5]. The downbeat estimator is tuned to assume 4 beats per bar.
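The snap-scale conversion of this section can be sketched in Python as follows: choose the beat multiple whose period is closest to 1 s, then average the 0.1 s CP frames over a window one snap period long centered on each snap. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the beat period and the snap times (for example every k-th beat starting at a downbeat) are already available from a beat/downbeat tracker, and the helper names `snap_multiple` and `pool_to_snaps`, as well as the `max_multiple` search bound, are hypothetical.

```python
import numpy as np


def snap_multiple(beat_period, target=1.0, max_multiple=8):
    """Return the integer k such that k * beat_period is closest to `target` seconds.

    `max_multiple` is a hypothetical search bound, not a value from the paper.
    """
    candidates = np.arange(1, max_multiple + 1)
    return int(candidates[np.argmin(np.abs(candidates * beat_period - target))])


def pool_to_snaps(features, hop, snap_times, snap_period):
    """Average frame-level features (n_frames x n_dims) over a window of
    length `snap_period`, centered on each snap time (all times in seconds).

    Snaps whose window contains no frame are left as zero vectors (an assumption).
    """
    features = np.asarray(features, dtype=float)
    frame_times = np.arange(features.shape[0]) * hop  # time stamp of each CP frame
    pooled = np.zeros((len(snap_times), features.shape[1]))
    for i, t in enumerate(snap_times):
        in_window = np.abs(frame_times - t) <= snap_period / 2
        if in_window.any():
            pooled[i] = features[in_window].mean(axis=0)
    return pooled
```

For a tempo of 120 BPM (beat period 0.5 s), `snap_multiple(0.5)` returns 2, i.e. a snap of two beats lasting 1 s.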
We associate with each snap the mean of the CP features contained in a window centered on the snap and lasting one snap period.

This document is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 License. http://creativecommons.org/licenses/by-nc-sa/3.0/ © 2012 The Authors.

2. STRUCTURAL BORDER ESTIMATION

2.1 Morphological model

We assume that a structural segment can be characterized by its inner organization, according to its musical layers (timbre, harmony, melody, ...). To this end, we consider the system-and-contrast model by Bimbot et al. [2]. It assumes that each structural segment is typically built from a group of four morphological elements of four snaps each, denoted $\{a_1, a_2, a_3, a_4\}$. The first three elements are related by simple transformations $f$ and $g$ such that $a_2 = f(a_1)$ and $a_3 = g(a_1)$. The fourth element can either follow the logic of the first three and thus form a system ($a_4 = f(g(a_1))$), or, on the contrary, contrast with it ($a_4 = \delta(f(g(a_1)))$ where $\delta \neq \mathrm{id}$). Note that we assume that the layers relevant for structural analysis can vary from one structural segment to another. However, in many cases, either $f = \mathrm{id}$ or $g = \mathrm{id}$, or both. This leads to commonly observed morphological motifs such as aaaa, abab, aabb for systems with no contrast, or aaab, abac, aabc for systems ending with a contrast. These motifs extend to the case where the identity function id is replaced by “close to identity” functions id′ (aaa′b, aba′c, aa′bc, ...). More information on this model can be found in [3].

2.2 Segmentation criterion

The aim is to evaluate, for each time unit, the likelihood that it corresponds to the beginning of a system. We assume that at least one of the relations ($f$ and/or $g$) between the elements of a system is the identity function.
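To make the system/contrast distinction concrete, here is a toy Python check on symbolic motifs like those listed in section 2.1. It only illustrates the model under the simplifying assumption that $f$ or $g$ is the identity, working on letter labels rather than on audio features; it is not the segmentation criterion itself. The function name `classify_motif` is hypothetical, and treating a primed symbol (a′) as equal to its base symbol is an assumption standing in for the “close to identity” functions.

```python
def classify_motif(motif):
    """Classify a four-element motif such as "abac" or "aba′c".

    Returns "system", "contrast", or None when neither f nor g is the identity
    (the symbolic labels alone then say nothing about f and g).
    """
    # Treat "close to identity" variants (a′ or a') as the same symbol as a.
    symbols = [c for c in motif if c not in "'′"]
    if len(symbols) != 4:
        raise ValueError("expected exactly four morphological elements")
    a1, a2, a3, a4 = symbols
    f_is_id = a2 == a1  # a2 = f(a1) with f = id
    g_is_id = a3 == a1  # a3 = g(a1) with g = id
    if not (f_is_id or g_is_id):
        return None
    # A system satisfies a4 = f(g(a1)):
    # if f = id this is g(a1) = a3; if g = id it is f(a1) = a2.
    expected = a3 if f_is_id else a2
    return "system" if a4 == expected else "contrast"


for m in ["aaaa", "abab", "aabb", "aaab", "abac", "aabc", "aa′bc"]:
    print(m, classify_motif(m))
```

On these inputs the check reproduces the grouping given in section 2.1: the first three motifs are systems and the remaining ones end with a contrast.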
For each snap $t \in [1, T]$ of a music piece, we consider an analysis window of size $N = 16$ snaps, so as to cover three morphological elements starting from $t$ ($a_1, a_2, a_3$) and one morphological element before this snap ($a_0$), as represented in figure 1. The size of each morphological element is $N_m = 4$ snaps. The criterion $\Phi$ used in this work is a linear combination of two quantities:

$$\Phi = \lambda_1 \sigma_{\text{System}} + \lambda_2 \sigma_{\text{Contrast}} \qquad (1)$$

hal-00727791, version 2, 22 Oct 2012

Author manuscript, published in "The Music Information Retrieval Evaluation eXchange (MIREX), ISMIR 2012, Porto, Portugal (2012)"
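The analysis window of section 2.2 and the combination in Eq. (1) can be sketched as follows. Since $\sigma_{\text{System}}$ and $\sigma_{\text{Contrast}}$ are not defined in this excerpt, the sketch takes them as precomputed per-snap arrays rather than guessing their form. The weights, the 0-based indexing (the text indexes snaps from 1) and the names `analysis_window` and `criterion` are assumptions for illustration only.

```python
import numpy as np

N_M = 4  # snaps per morphological element (N_m in the text); window = 4 * N_M = 16 snaps


def analysis_window(snaps, t, nm=N_M):
    """Split the window around snap index t (0-based) into (a0, a1, a2, a3).

    a0 covers snaps [t - nm, t); a1, a2, a3 cover [t, t + 3 * nm).
    Returns None when the window would extend past either end of the piece.
    """
    snaps = np.asarray(snaps)
    start = t - nm
    if start < 0 or start + 4 * nm > len(snaps):
        return None
    return tuple(snaps[start + k * nm : start + (k + 1) * nm] for k in range(4))


def criterion(sigma_system, sigma_contrast, lam1, lam2):
    """Eq. (1): Phi = lam1 * sigma_System + lam2 * sigma_Contrast, per snap."""
    return lam1 * np.asarray(sigma_system, dtype=float) + lam2 * np.asarray(
        sigma_contrast, dtype=float
    )
```

Returning None near the edges is one simple choice; how the original system handles the first and last snaps of a piece is not stated in this excerpt.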
Similar resources
Robust Potato Color Image Segmentation using Adaptive Fuzzy Inference System
Potato image segmentation is an important part of image-based potato defect detection. This paper presents a robust potato color image segmentation through a combination of a fuzzy rule based system, an image thresholding based on Genetic Algorithm (GA) optimization and morphological operators. The proposed potato color image segmentation is robust against variation of background, distance and ...
Adaptive Network-based Fuzzy Inference System-Genetic Algorithm Models for Prediction Groundwater Quality Indices: a GIS-based Analysis
The prediction of groundwater quality is very important for the management of water resources and environmental activities. The present study has integrated a number of methods such as Geographic Information Systems (GIS) and Artificial Intelligence (AI) methodologies to predict groundwater quality in Kerman plain (including HCO3-, concentrations and Electrical Conductivity (EC) of groundwater)...
Double Fuzzy Implications-Based Restriction Inference Algorithm
The main condition of the differently implicational inference algorithm is reconsidered from a contrary direction, which motivates a new fuzzy inference strategy, called the double fuzzy implications-based restriction inference algorithm. New restriction inference principle is proposed, which improves the principle of the full implication restriction inference algorithm. Furthermore, focusing on the ...
A Real Time Adaptive Multiresolution Adaptive Wiener Filter Based On Adaptive Neuro-Fuzzy Inference System And Fuzzy evaluation
In this paper, a real-time denoising filter based on modelling of stable hybrid models is presented. The hybrid models are composed of the shearlet filter and the adaptive Wiener filter in different forms. The optimization of various models is accomplished by the genetic algorithm. Next, regarding the significant relationship between Optimal models and input images, changing the structure of Optim...
Application of a Blind Source Separation Algorithm to the Separation of Speech and Music Signals
In this paper, the application of the Independent Component Analysis technique in speech-music separation is discussed. The separation algorithm is in the time domain. It needs the score function estimation to minimize the mutual information. For estimating score function, sufficient samples of the mixed (speech-music) signals...
Functional Brain Response to Emotional Musical Stimuli in Depression, Using INLA Approach for Approximate Bayesian Inference
Introduction: One of the vital skills which has an impact on emotional health and well-being is the regulation of emotions. In recent years, the neural basis of this process has been considered widely. One of the powerful tools for eliciting and regulating emotion is music. The Anterior Cingulate Cortex (ACC) is part of the emotional neural circuitry involved in Major Depressive Disorder (MDD)....
Publication date: 2012